Progress of Speech Recognition using the Corpus of Spontaneous Japanese (CSJ)

Author

  • Tatsuya Kawahara

Abstract

The report gives an overview of the current state of spontaneous speech recognition using the “Corpus of Spontaneous Japanese (CSJ)”. It shows that the large-scale corpus had a strong impact on training acoustic and language models that account for the morphological and pronunciation variations characteristic of spontaneous Japanese. Unsupervised adaptation of these models and adaptation to the speaking rate are also effective, and we have achieved a word accuracy of 78.0%, a significant improvement over the past couple of years.
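
The abstract reports its result as word accuracy. The following is a minimal sketch of how that figure is conventionally computed, assuming the standard definition Acc = (N - S - D - I) / N, where N is the number of reference words and S, D and I are the substitutions, deletions and insertions in a minimum-edit-distance alignment. The function names are hypothetical, and the abstract does not state the exact scoring units; Japanese transcripts would first have to be segmented into words (morphemes), which this sketch assumes has already been done.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words (row i-1 of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            sub = prev[j - 1] + (ref[i - 1] != hyp[j - 1])  # match or substitution
            dele = prev[j] + 1                              # reference word deleted
            ins = curr[j - 1] + 1                           # extra hypothesis word inserted
            curr[j] = min(sub, dele, ins)
        prev = curr
    return prev[len(hyp)]


def word_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Word accuracy (N - S - D - I) / N, assuming pre-segmented word sequences."""
    if not ref:
        raise ValueError("reference must contain at least one word")
    # The minimum edit distance equals S + D + I of an optimal alignment.
    return (len(ref) - edit_distance(ref, hyp)) / len(ref)


if __name__ == "__main__":
    # One substitution (b -> x) and one insertion (e): (4 - 2) / 4
    print(word_accuracy("a b c d".split(), "a x c d e".split()))  # prints 0.5
```

Because insertions count against accuracy, the value can fall below zero when a hypothesis contains many spurious words; this makes word accuracy stricter than the word correct rate, (N - S - D) / N, which ignores insertions.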

Similar resources

Corpus of Spontaneous Japanese: Its Design and Evaluation

Corpus of Spontaneous Japanese, or CSJ, is a large-scale database of spontaneous Japanese. It contains speech signals and transcriptions of about 7 million words along with various annotations such as POS and phonetic labels. After its design issues are described, a preliminary evaluation of the CSJ is presented. The results strongly suggest the usefulness of the CSJ as a resource for the study of spo...

Training a Language Model Using Webdata for Large Vocabulary Japanese Spontaneous Speech Recognition

This paper describes a language modeling method using large-scale spoken language data retrieved from the Web for spontaneous speech recognition. We downloaded 15 million Web pages on a comprehensive range of topics. Next, spoken-language-like texts were selected from the downloaded Web data using the naïve Bayes classifier, and typical linguistic phenomena such as fillers and pauses were added usin...

Dependency-structure Annotation to Corpus of Spontaneous Japanese

In Japanese, syntactic structure of a sentence is generally represented by the relationship between phrasal units, or bunsetsus in Japanese, based on a dependency grammar. In the same way, the syntactic structure of a sentence in a large, spontaneous, Japanese-speech corpus, the Corpus of Spontaneous Japanese (CSJ), is represented by dependency relationships between bunsetsus. This paper descri...

A Corpus-based Analysis on Prosody and Discourse Structure in Japanese Spontaneous Monologues

The aim of this paper is twofold. First, the paper attempts to investigate prosody and discourse structure in Japanese spontaneous monologues by using the prosodic labels of the Corpus of Spontaneous Japanese (CSJ). The analyses of F0 peak trends and prosodic breaks confirmed previous findings in [1]. Secondly, the paper attempts to evaluate the validity of prosodic labels of the X-JToBI syst...

Benchmark Test for Speech Recognition Using the Corpus of Spontaneous Japanese

We present benchmark results of automatic speech recognition using the Corpus of Spontaneous Japanese (CSJ), which has been developed in a five-year national project and will be the largest spontaneous speech database. New test sets are designed for both academic presentation speech and extemporaneous public speech, which are the two major categories in the corpus. The test sets are selected ...

Publication date: 2004